
[Master to feature] Merge master to feature branch #6961

Merged
minglumlu merged 129 commits into feature/trusted-certs from master
Mar 20, 2026

Conversation

@minglumlu
Member

No description provided.

djs55 and others added 30 commits October 29, 2014 18:35
Signed-off-by: David Scott <dave.scott@eu.citrix.com>
Signed-off-by: David Scott <dave.scott@eu.citrix.com>
Signed-off-by: David Scott <dave.scott@eu.citrix.com>
We never 'throw' exceptions between threads so we can use thread-local
backtrace tables to avoid too much contention / having to resize the
global table.

Users must wrap all their threads in a

  Backtrace.with_backtraces (fun () -> ... )

Failure to do so will generate a backtrace with an error in it (therefore
it should be obvious that this wasn't done)

Signed-off-by: David Scott <dave.scott@eu.citrix.com>
We use Hashtbl.add and then Hashtbl.remove, but always adding
a reference to the same record. We use the size of the bindings
list in the hashtable as the reference count.

Signed-off-by: David Scott <dave.scott@eu.citrix.com>
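The reference-counting trick described above can be sketched with the standard library alone. This is only an illustration, not the library's actual code: the names `table`, `acquire`, `release` and `refcount` are hypothetical, and an `int` key stands in for whatever key the real table uses.

```ocaml
(* Hypothetical sketch: use the number of stacked bindings for a key as a
   reference count. [Hashtbl.add] hides (but keeps) earlier bindings and
   [Hashtbl.remove] pops the most recent one. *)
let table : (int, string ref) Hashtbl.t = Hashtbl.create 16

(* Take a reference: push one more binding to the same record. *)
let acquire key record = Hashtbl.add table key record

(* Drop a reference: pop the most recent binding. *)
let release key = Hashtbl.remove table key

(* The reference count is the length of the bindings list for the key. *)
let refcount key = List.length (Hashtbl.find_all table key)

let demo () =
  let record = ref "backtrace" in
  acquire 1 record ;
  acquire 1 record ;
  let after_two_acquires = refcount 1 in
  release 1 ;
  let after_one_release = refcount 1 in
  release 1 ;
  (after_two_acquires, after_one_release, refcount 1)
```

The record disappears from the table only once every `acquire` has been matched by a `release`.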
Use thread-local backtrace tables
We need a way to construct backtraces from data sent from other
languages.

Signed-off-by: David Scott <dave.scott@eu.citrix.com>
Add support for Interoperating with other languages
We take the hit of parsing the OCaml < 4.02 stacktraces safe in the
knowledge that we can optimise this away (with optcomp) later.

This also means we take full control over the rendering of stacktraces
and can make the python and OCaml backtraces look the same.

Signed-off-by: David Scott <dave.scott@eu.citrix.com>
Signed-off-by: David Scott <dave.scott@eu.citrix.com>
Store backtraces as lists of records rather than strings
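As a rough sketch of what storing frames as records can look like, the snippet below parses the pre-4.02 textual frame format into a record. The `frame` type and `parse_frame` function are hypothetical names for illustration and are not taken from the library; only the two line shapes quoted in this conversation ("Raised at file ..." and "Called from file ...") are handled.

```ocaml
(* Hypothetical record for one backtrace frame. *)
type frame = {raised: bool; file: string; line: int; chars: int * int}

(* Parse one textual frame; returns [None] for anything unrecognised. *)
let parse_frame (s : string) : frame option =
  let mk raised file line c1 c2 = {raised; file; line; chars= (c1, c2)} in
  try
    Some
      (Scanf.sscanf s "Raised at file %S, line %d, characters %d-%d" (mk true))
  with Scanf.Scan_failure _ | End_of_file | Failure _ -> (
    try
      Some
        (Scanf.sscanf s "Called from file %S, line %d, characters %d-%d"
           (mk false) )
    with Scanf.Scan_failure _ | End_of_file | Failure _ -> None )
```

Once frames are records, rendering is a separate step, which is what makes it possible to print OCaml and Python backtraces in the same layout.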
All backtraces have been off by one...

0/9 xapi @ renoir Raised at file "db_rpc_client_v1.ml", line 33, characters 14-39
...
8/9 xapi @ renoir Called from file "lib/backtrace.ml", line 150, characters 17-21

Signed-off-by: Si Beaumont <simon.beaumont@citrix.com>
Correct backtrace ordering by indexing from 1
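The off-by-one above (`0/9` through `8/9`) is the usual zero-based index leaking into the output. A minimal sketch of one-based numbering, using a hypothetical `number_frames` helper over already-rendered frame strings:

```ocaml
(* Hypothetical helper: prefix each frame with "i/n", counting from 1. *)
let number_frames (frames : string list) : string list =
  let n = List.length frames in
  List.mapi (fun i frame -> Printf.sprintf "%d/%d %s" (i + 1) n frame) frames
```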
Signed-off-by: Si Beaumont <simon.beaumont@citrix.com>
Signed-off-by: Si Beaumont <simon.beaumont@citrix.com>
Signed-off-by: Si Beaumont <simon.beaumont@citrix.com>
Enable Travis using ocaml-travisci-skeleton
Signed-off-by: Si Beaumont <simon.beaumont@citrix.com>
Signed-off-by: Si Beaumont <simon.beaumont@citrix.com>
Signed-off-by: David Scott <dave@recoil.org>
Signed-off-by: David Scott <dave@recoil.org>
Signed-off-by: Marcello Seri <marcello.seri@citrix.com>
Merge pp-ely into master for moving to OCaml 4.02.3 and PPX
Run Travis build on all branches, but only upload docs on the master
branch.

Signed-off-by: Gabor Igloi <gabor.igloi@citrix.com>
Fix travis build; update opam file
Signed-off-by: Gabor Igloi <gabor.igloi@citrix.com>
edwintorok and others added 27 commits March 18, 2026 15:58
Also requires adjusting the test output, because line numbers have changed.

Signed-off-by: Edwin Török <edwin.torok@citrix.com>
Instead of external.

Signed-off-by: Edwin Török <edwin.torok@citrix.com>
xe `--trace` has existed since ~2014, but it isn't documented in
`--help`, and therefore not well known.
Also it only worked on a single host, limiting its usefulness in a pool.
However propagating backtraces between XAPIs in a pool is doable, by
using the already existing `backtrace` field in the Task object.

Having working cross-host backtraces appears to have been the original
design goal in
[doc/content/design/backtraces.md](https://github.com/xapi-project/xen-api/blob/master/doc/content/design/backtraces.md)

In theory this should also work cross-language, with python SM backends,
however some plumbing is missing there: it currently doesn't work with
either SMAPIv1 or SMAPIv3. Fixing that should be the topic of another PR
(by someone else).
Generally an ejected host can be considered as a fresh-installed one.
But in practice, the update level (the hash) is useful to determine the
update state of an ejected host.

One such case is a host that has been ejected from a pool: the retained
applied-update hash is useful for determining whether the host can easily
join the pool again.
To make changes to the backtrace library we need to first import it into
XAPI.
Used `git subtree` to import it with its full history.

Follow-up pull-requests will update it to:
* use new functions from Printexc (introduced in 4.02 and 4.11) that
avoid parsing the strings
* deduplicate entries
* print function names and characters, not just line numbers

Eventually it could also be updated to capture a backtrace automatically
(e.g. a `Backtrace.try_with` function).
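A sketch of what the first follow-up could look like, using only standard-library `Printexc` functions (`backtrace_slots` and `Slot.location` from 4.02, `Slot.name` from 4.11). The `frame` type and `frames_of_raw` helper are hypothetical names and are not part of the backtrace library.

```ocaml
(* Hypothetical structured frame, filled in without parsing any strings. *)
type frame = {filename: string; line: int; name: string option}

(* Convert a raw backtrace into frames. Returns [] when the program was
   built without debug information. *)
let frames_of_raw (bt : Printexc.raw_backtrace) : frame list =
  match Printexc.backtrace_slots bt with
  | None ->
      []
  | Some slots ->
      Array.to_list slots
      |> List.filter_map (fun slot ->
             match Printexc.Slot.location slot with
             | None ->
                 None
             | Some loc ->
                 Some
                   { filename= loc.Printexc.filename
                   ; line= loc.Printexc.line_number
                   ; name= Printexc.Slot.name slot } )

let () =
  Printexc.record_backtrace true ;
  try failwith "boom"
  with Failure _ ->
    Printexc.get_raw_backtrace ()
    |> frames_of_raw
    |> List.iter (fun f ->
           Printf.printf "%s:%d %s\n" f.filename f.line
             (Option.value f.name ~default:"?") )
```

What gets printed depends on whether the program is compiled with `-g`, so no output is shown here.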

A chainbuild passed using this PR, together with another internal PR
that drops references to the external `xapi-backtrace` module from other
internal packages.
Writing code that calls XAPI functions is quite tedious, because you have to
repeat `~rpc  ~session_id` every time.

It saves quite a lot of typing to write in this style instead:

```
open Client.Client

...
  let value = call t @@ VM.maximise_memory ~self ~approximate:false ~total in
  call t @@ VM.set_memory ~self ~value
```

You still need to repeat `call t @@`, but it is at the beginning and doesn't
hinder readability.

Add new types and `val call` to Client.Client.
The type is called `client` instead of `t` because it isn't used uniformly by
other functions in this module.
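To illustrate the idea, here is a self-contained sketch of how such a `call` helper can work. Everything below is a stand-in: the `rpc` type, the record fields, and the stub `VM` module are assumptions made for the example and are not the generated `Client.Client` definitions.

```ocaml
(* Stand-in for the real RPC function type (an assumption). *)
type rpc = string -> string

(* Bundle the two arguments that every client call needs. *)
type client = {rpc: rpc; session_id: string}

(* Supply [~rpc] and [~session_id] to a partially applied client function. *)
let call (c : client) (f : rpc:rpc -> session_id:string -> 'a) : 'a =
  f ~rpc:c.rpc ~session_id:c.session_id

(* Stub module imitating the shape of generated client functions. *)
module VM = struct
  let memory : (string, int64) Hashtbl.t = Hashtbl.create 4

  let set_memory ~(rpc : rpc) ~(session_id : string) ~self ~value =
    ignore (rpc, session_id) ;
    Hashtbl.replace memory self value

  let get_memory ~(rpc : rpc) ~(session_id : string) ~self =
    ignore (rpc, session_id) ;
    Hashtbl.find memory self
end

let () =
  let t = {rpc= (fun request -> request); session_id= "example-session"} in
  let self = "vm-1" in
  call t @@ VM.set_memory ~self ~value:1024L ;
  Printf.printf "%Ld\n" (call t @@ VM.get_memory ~self)
```

The sketch relies on `~rpc` and `~session_id` being the first two labelled parameters, so that a partial application leaves exactly those two to be filled in.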

No functional change to the product.

Signed-off-by: Edwin Török <edwin.torok@citrix.com>
This uses a previously unused field in the log message format to log
the Trace Context.
This includes the Trace ID (common for the entire tree of activities),
and parent Span ID (unique to this instance of the remote caller).
We don't log the local span/parent ID, since this will keep changing.

Logging the traceparent could make it easier to group log messages belonging
to the same high level activity.

When an external Trace Context is not available (the default) then the log
messages are unchanged.
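For reference, a W3C `traceparent` value has the form `version-traceid-parentid-flags`, with a 32-digit hex trace ID and a 16-digit hex parent ID. The sketch below extracts the two IDs that the commit describes logging. The `trace_context` type and `parse_traceparent` function are hypothetical names, and the validation is deliberately partial (for example, all-zero IDs are not rejected).

```ocaml
(* Hypothetical record holding the parts of a version-00 traceparent. *)
type trace_context = {trace_id: string; parent_id: string; sampled: bool}

let is_lower_hex s =
  let ok = ref true in
  String.iter (function '0' .. '9' | 'a' .. 'f' -> () | _ -> ok := false) s ;
  !ok

(* Parse "00-<32 hex>-<16 hex>-<2 hex>"; [None] for anything else. *)
let parse_traceparent (s : string) : trace_context option =
  match String.split_on_char '-' s with
  | [version; trace_id; parent_id; flags]
    when version = "00"
         && String.length trace_id = 32
         && is_lower_hex trace_id
         && String.length parent_id = 16
         && is_lower_hex parent_id
         && String.length flags = 2
         && is_lower_hex flags ->
      (* Bit 0 of the flags byte is the "sampled" flag. *)
      let sampled = int_of_string ("0x" ^ flags) land 1 = 1 in
      Some {trace_id; parent_id; sampled}
  | _ ->
      None
```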

Another alternative would be to explicitly pass a scope/context to the logging
functions, but this would require some automated rewriting of the codebase to
plumb through the required parameters.
With the ambient context the change is much smaller, and we can still plumb
through an explicit context later if needed.

To avoid a dependency cycle this is not using Threadext, but Ambient_context
directly.

The first user of this will be the new quicktest.

Signed-off-by: Edwin Török <edwin.torok@citrix.com>
This will build upon the upstream Opentelemetry library,
so we can gradually move the existing Tracing library over.
The upstream library supports Logs and Metrics too, not just Traces.

For now this lives inside quicktest, eventually it should be moved
into our tracing library.

No functional change.

Signed-off-by: Edwin Török <edwin.torok@citrix.com>
Some quicktests may run for a long time, and we don't want to run out of memory
if they keep creating events/logs/metrics on the same span.

This uses a Queue internally, so that we can drop the oldest element when full.
Could've used a ringbuffer, but that would've increased per-span memory usage a
lot.
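A minimal sketch of the drop-oldest behaviour on top of the standard `Queue` module. The `bounded` type and its functions are hypothetical names for illustration and are not the code added by this commit.

```ocaml
(* Hypothetical bounded queue: holds at most [capacity] items. *)
type 'a bounded = {capacity: int; items: 'a Queue.t}

let create capacity =
  if capacity <= 0 then invalid_arg "create: capacity must be positive" ;
  {capacity; items= Queue.create ()}

(* When full, drop the oldest element before adding the new one. *)
let push b x =
  if Queue.length b.items >= b.capacity then ignore (Queue.pop b.items) ;
  Queue.push x b.items

(* Contents from oldest to newest. *)
let to_list b = List.rev (Queue.fold (fun acc x -> x :: acc) [] b.items)
```

An empty `Queue.t` costs very little, whereas a preallocated ring buffer pays for its full capacity on every span, which is the trade-off the commit message mentions.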

No functional change.

Signed-off-by: Edwin Török <edwin.torok@citrix.com>
The backend is very simple, and may block the caller if the background thread
is slow due to I/O.
This is not suitable for production use, just for testing
(eventually we should use the atomic queue we have in Tracing_export).

No functional change.

Can be imported into a local Jaeger instance like this:

```
curl -v localhost:4318/v1/traces --data-binary @trace.trace.otel -H 'Content-Type: application/x-protobuf' -o x
```

Logs and Metrics are not supported by Jaeger though, so those would have to be
imported into another tool.

Signed-off-by: Edwin Török <edwin.torok@citrix.com>
Until we can upgrade to a newer version of opentelemetry which includes it.

No functional change.

Signed-off-by: Edwin Török <edwin.torok@citrix.com>
Extends upstream Opentelemetry with convenience functions
to record logs and metrics associated with spans.

Implements sampling decisions.

No functional change.

Signed-off-by: Edwin Török <edwin.torok@citrix.com>
This is a parent based sampler: if the parent is sampled, then so is the
current span, otherwise it defaults to recording if a backend is registered.

This will allow implementing a tail based span processor that changes the
sampling decision when a span fails.

For now we have only 1 hardcoded sampler, eventually we might make this
configurable.
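One possible reading of that rule, as a sketch. The three decisions mirror the OpenTelemetry specification's DROP, RECORD_ONLY and RECORD_AND_SAMPLE; mapping "defaults to recording" to `Record_only` is an assumption, and the `decision`, `parent`, `decide` and `on_span_end` names are hypothetical.

```ocaml
type decision = Drop | Record_only | Record_and_sample

type parent = No_parent | Parent of {sampled: bool}

(* Parent-based rule: follow a sampled parent; otherwise record only when
   some backend is registered (assumed interpretation). *)
let decide ~backend_registered parent =
  match parent with
  | Parent {sampled= true} ->
      Record_and_sample
  | Parent {sampled= false} | No_parent ->
      if backend_registered then Record_only else Drop

(* A tail-based processor could later upgrade a recorded span that failed. *)
let on_span_end ~failed = function
  | Record_only when failed ->
      Record_and_sample
  | decision ->
      decision
```

Keeping unsampled spans in the recorded state is what would leave room for the later upgrade when a span fails.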

No functional change.

Signed-off-by: Edwin Török <edwin.torok@citrix.com>
Wrapper around upstream Trace module using our Scope,
and with support for [result].

Signed-off-by: Edwin Török <edwin.torok@citrix.com>
Signed-off-by: Edwin Török <edwin.torok@citrix.com>
Signed-off-by: Edwin Török <edwin.torok@citrix.com>
We may want to emit Opentelemetry items to multiple destinations
(console, disk, etc.).
Implement a Collector.BACKEND functor that forwards all calls to 2 other backends.
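The forwarding pattern can be sketched as a two-argument functor. The `BACKEND` signature below is a heavily simplified, hypothetical stand-in (the upstream `Collector.BACKEND` has a different and larger interface), and `Tee` and `Recorder` are illustrative names.

```ocaml
(* Simplified, hypothetical backend interface. *)
module type BACKEND = sig
  val send : string -> unit

  val cleanup : unit -> unit
end

(* Forward every call to both backends, in order. *)
module Tee (A : BACKEND) (B : BACKEND) : BACKEND = struct
  let send item = A.send item ; B.send item

  let cleanup () = A.cleanup () ; B.cleanup ()
end

(* In-memory backend used only to demonstrate the functor. *)
module Recorder () : sig
  include BACKEND

  val seen : unit -> string list
end = struct
  let items = ref []

  let send item = items := item :: !items

  let cleanup () = ()

  let seen () = List.rev !items
end

module Console = Recorder ()
module Disk = Recorder ()
module Both = Tee (Console) (Disk)

let () = Both.send "span-1" ; Both.send "log-1"
```

Because `Tee` itself satisfies `BACKEND`, it can be nested to fan out to more than two destinations.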

No functional change.

Signed-off-by: Edwin Török <edwin.torok@citrix.com>
Currently useful for debugging what the output looks like.

Signed-off-by: Edwin Török <edwin.torok@citrix.com>
Useful for quicktest_trace.

Signed-off-by: Edwin Török <edwin.torok@citrix.com>
…ls to XAPI

Signed-off-by: Edwin Török <edwin.torok@citrix.com>
Signed-off-by: Edwin Török <edwin.torok@citrix.com>
Signed-off-by: Edwin Török <edwin.torok@citrix.com>
Signed-off-by: Edwin Török <edwin.torok@citrix.com>
Signed-off-by: Edwin Török <edwin.torok@citrix.com>
Signed-off-by: Edwin Török <edwin.torok@citrix.com>
…tions (#6858)

Test that we can fill a host with 1 VM, with N VMs, based on
maximise_memory/compute_memory_overhead.

Check that the constant factors used in XAPI are correct, e.g. amount of
memory used/vcpu.
Can be used to validate these PRs:
#6855
#6854

There is also a pagetable overhead calculation, but something weird is
going on there:
```
[2026-01-22T18:40:49.342348481-00:00|0000000000000000]  pagetables,memory_overhead_pages,coeff,vms
[2026-01-22T18:40:49.342333285-00:00|0000000000000000]  64,793,12.3906,9223372036854775807
[2026-01-22T18:40:49.342335974-00:00|0000000000000000]  192,1305,6.79688,9223372036854775807
[2026-01-22T18:40:49.342337658-00:00|0000000000000000]  448,2329,5.19866,9223372036854775807
[2026-01-22T18:40:49.342339751-00:00|0000000000000000]  962,4377,4.5499,9223372036854775807
[2026-01-22T18:40:49.342341392-00:00|0000000000000000]  263102,1048827,3.98639,9223372036854775807
[2026-01-22T18:40:49.342343128-00:00|0000000000000000]  526273,2097403,3.98539,9223372036854775807
[2026-01-22T18:40:49.342345071-00:00|0000000000000000]  708913,2825211,3.98527,9223372036854775807
```

The coefficient should be ~4. I don't know why it is as high as 12.4 (first
row); it used to be reliably 4, so this could be a bug in the test.
That'll need further investigation (also there is enough free memory on
the host that this underestimate doesn't actually cause a failure, which
is also unexpected).
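The `coeff` column in the log above appears to be `memory_overhead_pages / pagetables`; that is inferred from the numbers, not from the test source. A small sketch that recomputes it from the logged values (the function and list names are hypothetical):

```ocaml
(* Ratio of overhead pages to pagetable pages. *)
let coeff ~pagetables ~memory_overhead_pages =
  float_of_int memory_overhead_pages /. float_of_int pagetables

(* (pagetables, memory_overhead_pages) pairs copied from the log above. *)
let rows =
  [ (64, 793)
  ; (192, 1305)
  ; (448, 2329)
  ; (962, 4377)
  ; (263102, 1048827)
  ; (526273, 2097403)
  ; (708913, 2825211) ]

let () =
  List.iter
    (fun (pagetables, memory_overhead_pages) ->
      Printf.printf "%d,%d,%g\n" pagetables memory_overhead_pages
        (coeff ~pagetables ~memory_overhead_pages) )
    rows
```

Only the larger rows land close to the expected value of 4; the small ones are well above it.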
@minglumlu minglumlu merged commit 1c49383 into feature/trusted-certs Mar 20, 2026
66 checks passed